April 1999








XML for the absolute beginner

A guided tour from HTML to processing XML with Java




Page 2 of 10


HTML: All form and no substance
HTML is a language designed to "talk about" documents: headings, titles, captions, fonts, and so on. It's heavily document structure- and presentation-oriented.

Admittedly, artists and hackers have been able to work miracles with the relatively dull tool called HTML. But HTML has serious drawbacks that make it a poor fit for designing flexible, powerful, evolutionary information systems. Here are a few of the major complaints:

  • HTML isn't extensible
    An extensible markup language would allow application developers to define custom tags for application-specific situations. Unless you're a 600-pound gorilla (and maybe not even then), you can't require all browser manufacturers to implement all the markup tags necessary for your application. So you're stuck with what the big browser makers, or the W3C (World Wide Web Consortium), will let you have. What we need is a language that allows us to make up our own markup tags without having to call the browser manufacturer.

  • HTML is very display-centric
    HTML is a fine language for display purposes, unless you require a lot of precise formatting or transformation control (in which case it stinks). HTML mixes a document's logical structure (titles, paragraphs, and such) with presentation tags (bold, image alignment, and so on). Since almost all of the HTML tags have to do with how to display information in a browser, HTML is useless for other common network applications, such as data replication or application services. We need a way to unify these common functions with display, so the same server used to browse data can also, for example, perform enterprise business functions and interoperate with legacy systems.

  • HTML isn't usually directly reusable
    Creating documents in word processors and then exporting them as HTML is somewhat automated but still requires, at the very least, some tweaking of the output to achieve acceptable results. If the data from which the document was produced change, the entire HTML translation needs to be redone. By contrast, Web sites that show the current weather around the globe, around the clock, usually handle this automatic reformatting very well: the content and the presentation style of the document are separated, because the system designers understand that their content (the temperatures, forecasts, and so on) changes constantly. What we need is a way to specify data presentation in terms of structure, so that when data are updated, the formatting can be "reapplied" consistently and easily.

  • HTML only provides one 'view' of data
    It's difficult to write HTML that displays the same data in different ways based on user requests. Dynamic HTML is a start, but it requires an enormous amount of scripting and isn't a general solution to this problem. (Dynamic HTML is discussed in more detail below.) What we need is a way to get all the information we may want to browse at once, and look at it in various ways on the client.

  • HTML has little or no semantic structure
    Most Web applications would benefit from an ability to represent data by meaning rather than by layout. For example, it can be very difficult to find what you're looking for on the Internet, because there's no indication of the meaning of the data in HTML files (aside from META tags, which are usually misleading). Type red into a search engine, and you'll get links to Red Skelton, red herring, red snapper, the red scare, Red Letter Day, and probably a page or two of "Books I've Red." HTML has no way to specify what a particular page item means. A more useful markup language would represent information in terms of its meaning. What we need is a language that tells us not how to display information, but rather, what a given block of information is so we know what to do with it.
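
To make these wishes a little more concrete, here is a minimal Java sketch of what "markup by meaning" buys you. Everything in it is an assumption made for illustration: the tag names (weather, city, name, temperature), the sample values, and the class WeatherViews are all made up, and the parsing relies on the DOM classes bundled with current JDKs (javax.xml.parsers and org.w3c.dom) rather than on any tool discussed in this article. The point is only that the same data, tagged by what it is, can be displayed two different ways and also searched by meaning.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class WeatherViews {
    // Hypothetical markup: these tags say what the data IS, not how to show it.
    static final String XML =
          "<weather>"
        + "<city><name>Oslo</name><temperature unit=\"C\">-3</temperature></city>"
        + "<city><name>Cairo</name><temperature unit=\"C\">31</temperature></city>"
        + "</weather>";

    // Parse an XML string into a DOM tree using the JDK's built-in parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Text of the first descendant element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // View 1: one plain-text line per city.
    static List<String> asTextLines(Document doc) {
        List<String> lines = new ArrayList<>();
        NodeList cities = doc.getElementsByTagName("city");
        for (int i = 0; i < cities.getLength(); i++) {
            Element city = (Element) cities.item(i);
            lines.add(text(city, "name") + ": " + text(city, "temperature") + " C");
        }
        return lines;
    }

    // View 2: the same data rendered as an HTML table.
    static String asHtmlTable(Document doc) {
        StringBuilder html = new StringBuilder("<table>");
        NodeList cities = doc.getElementsByTagName("city");
        for (int i = 0; i < cities.getLength(); i++) {
            Element city = (Element) cities.item(i);
            html.append("<tr><td>").append(text(city, "name"))
                .append("</td><td>").append(text(city, "temperature"))
                .append("</td></tr>");
        }
        return html.append("</table>").toString();
    }

    // A query by meaning: names of cities whose temperature exceeds a threshold.
    static List<String> citiesWarmerThan(Document doc, int threshold) {
        List<String> names = new ArrayList<>();
        NodeList cities = doc.getElementsByTagName("city");
        for (int i = 0; i < cities.getLength(); i++) {
            Element city = (Element) cities.item(i);
            if (Integer.parseInt(text(city, "temperature")) > threshold) {
                names.add(text(city, "name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        asTextLines(doc).forEach(System.out::println);
        System.out.println(asHtmlTable(doc));
        System.out.println(citiesWarmerThan(doc, 0));
    }
}
```

Nothing in the data changes between the two views. Only the code that walks the structure differs, so when the temperatures are updated, either formatting can be "reapplied" without touching the other. The last method would be guesswork against an HTML page, where a temperature is just another table cell.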

SGML has none of these weaknesses, but in order to be general, it's hair-tearingly complex (at least in its complete form). The language used to format SGML (its "style language"), called DSSSL (Document Style Semantics and Specification Language), is extremely powerful but difficult to use. How do we get a language that's roughly as easy to use as HTML but has most of the power of SGML?

Origins of XML
As the Web exploded in popularity and people all over the world began learning about HTML, they fairly quickly started running into the limitations outlined above. Heavy-metal SGML wonks, who had been working with SGML for years in relative obscurity, suddenly found that everyday people had some understanding of the concept of markup (that is, HTML). SGML experts began to consider the possibility of using SGML on the Web directly, instead of using just one application of it (again, HTML). At the same time, they knew that SGML, while powerful, was simply too complex for most people to use.

In the summer of 1996, Jon Bosak (currently online information technology architect at Sun Microsystems) convinced the W3C to let him form a committee on using SGML on the Web. He created a high-powered team of muckety-mucks from the SGML world. By November of that year, these folks had created the beginnings of a simplified form of SGML that incorporated tried-and-true features of SGML but with reduced complexity. This was, and is, XML.

In March 1997, Bosak released his landmark paper, "XML, Java and the Future of the Web" (see Resources). Now, two years later (a very long time in the life of the Web), Bosak's short paper is still a good, if dated, introduction to why using XML is such an excellent idea.

SGML was created for general document structuring, and HTML was created as an application of SGML for Web documents. XML is a simplification of SGML for general Web use.







Copyright © 2003 JavaWorld.com, an IDG company